Multimodal emotion recognition in audiovisual communication
Abstract
This paper discusses innovative techniques for automatically estimating a user's emotional state by analyzing the speech signal and the haptic interaction on a touch screen or via mouse. Knowledge of a user's emotion permits adaptive strategies that strive for a more natural and robust interaction. We classify seven emotional states: surprise, joy, anger, fear, disgust, sadness, and a neutral user state. The user's emotion is extracted by a parallel stochastic analysis of the spoken and haptic machine interactions while the desired intention is being interpreted. The introduced methods are based on the common prosodic speech features pitch and energy, but also rely on the semantic and intention-based features wording, degree of verbosity, temporal intention and word rate, and finally the history of user utterances. As a further modality, touch-screen or mouse interaction is analyzed. The estimates based on these features are integrated in a multimodal way. The introduced methods build on the results of user studies. A realization proved to be reliable when compared with the probands' subjective impressions.
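The abstract does not spell out the fusion rule, so the following is only a minimal late-fusion sketch in Python of how per-modality estimates (speech prosody, wording/semantics, haptic interaction) over the seven named states could be combined; the function name `fuse_estimates`, the modality weights, and the example probabilities are hypothetical illustrations, not taken from the paper.

```python
# Minimal sketch of weighted late fusion of per-modality emotion estimates.
# NOTE: this is an assumption-based illustration, not the authors' method;
# all weights and example probabilities below are made up.
from typing import Dict, List

EMOTIONS: List[str] = [
    "surprise", "joy", "anger", "fear", "disgust", "sadness", "neutral",
]

def fuse_estimates(
    modality_probs: Dict[str, List[float]],
    modality_weights: Dict[str, float],
) -> Dict[str, float]:
    """Combine per-modality class probabilities into one fused distribution."""
    fused = [0.0] * len(EMOTIONS)
    total_weight = sum(modality_weights.get(m, 0.0) for m in modality_probs)
    for modality, probs in modality_probs.items():
        weight = modality_weights.get(modality, 0.0) / total_weight
        for i, p in enumerate(probs):
            fused[i] += weight * p
    # Renormalize so the fused scores form a probability distribution.
    total = sum(fused)
    return {emotion: score / total for emotion, score in zip(EMOTIONS, fused)}

if __name__ == "__main__":
    # Hypothetical per-modality outputs for a single user turn.
    estimates = {
        "prosody": [0.05, 0.05, 0.55, 0.10, 0.05, 0.10, 0.10],
        "wording": [0.05, 0.05, 0.40, 0.05, 0.15, 0.10, 0.20],
        "haptics": [0.05, 0.05, 0.50, 0.05, 0.05, 0.05, 0.25],
    }
    weights = {"prosody": 0.5, "wording": 0.3, "haptics": 0.2}
    fused = fuse_estimates(estimates, weights)
    print(max(fused, key=fused.get), fused)
```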
Related articles
MEC 2016: The Multimodal Emotion Recognition Challenge of CCPR 2016
Emotion recognition is a significant research field of pattern recognition and artificial intelligence. The Multimodal Emotion Recognition Challenge (MEC) is a part of the 2016 Chinese Conference on Pattern Recognition (CCPR). The goal of this competition is to compare multimedia processing and machine learning methods for multimodal emotion recognition. The challenge also aims to provide a c...
Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data
Automatic emotion recognition systems based on supervised machine learning require reliable annotation of affective behaviours to build useful models. Whereas the dimensional approach is getting more and more popular for rating affective behaviours in continuous time domains, e.g., arousal and valence, methodologies to take into account reaction lags of the human raters are still rare. We theref...
Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling
In this paper, we apply a context-sensitive technique for multimodal emotion recognition based on feature-level fusion of acoustic and visual cues. We use bidirectional Long Short-Term Memory (BLSTM) networks which, unlike most other emotion recognition approaches, exploit long-range contextual information for modeling the evolution of emotion within a conversation. We focus on recognizing dimen...
Emotional pictures and sounds: a review of multimodal interactions of emotion cues in multiple domains
In everyday life, multiple sensory channels jointly trigger emotional experiences and one channel may alter processing in another channel. For example, seeing an emotional facial expression and hearing the voice's emotional tone will jointly create the emotional experience. This example, where auditory and visual input is related to social communication, has gained considerable attention by res...
Speaker Dependency Analysis, Audiovisual Fusion Cues and a Multimodal BLSTM for Conversational Engagement Recognition
Conversational engagement is a multimodal phenomenon and an essential cue to assess both human-human and human-robot communication. Speaker-dependent and speaker-independent scenarios were addressed in our engagement study. Handcrafted audio-visual features were used. Fixed window sizes for feature fusion method were analysed. Novel dynamic window size selection and multimodal bi-directional lo...